
2 research outputs found

Circular sequence comparison: algorithms and applications

Authors: Ahmad Retha (7168871)
Costas S. Iliopoulos (7168862)
Fatima Vayani (7168874)
Nadia Pisanti (7168865)
Robert Mercas (2835212)
Roberto Grossi (7168859)
Solon P. Pissis (7168868)
Publication date: 01/01/2016

Background: Sequence comparison is a fundamental step in many important tasks in bioinformatics, from phylogenetic reconstruction to the reconstruction of genomes. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, and are generally computed through sequence alignment techniques. Because circular molecular structure is a common phenomenon in nature, alignment techniques have been adapted to circular sequence comparison; a caveat of these adaptations is that they are computationally expensive, requiring super-quadratic to cubic time in the length of the sequences.

Results: In this paper, we introduce a new distance measure based on q-grams and show how it can be applied effectively and computed efficiently for circular sequence comparison. Experimental results, using real DNA, RNA, and protein sequences as well as synthetic data, demonstrate orders-of-magnitude superiority of our approach in terms of efficiency, while maintaining accuracy very competitive with the state of the art.
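The abstract does not define the new measure, so the sketch below only illustrates the general idea it builds on: the classic q-gram distance (the L1 distance between q-gram count vectors), computed here on cyclic q-grams so that every rotation of a sequence gets the same profile. This is an illustrative assumption, not the authors' measure; the names `cyclic_qgrams` and `qgram_distance` are hypothetical.

```python
from collections import Counter

# Illustrative sketch only: plain q-gram distance on cyclic q-grams,
# NOT the distance measure introduced in the paper.

def cyclic_qgrams(s: str, q: int) -> Counter:
    """Count every length-q substring of s read as a circular string."""
    if not 0 < q <= len(s):
        raise ValueError("q must satisfy 0 < q <= len(s)")
    # Append the first q-1 symbols so q-grams that wrap around the end are included.
    ext = s + s[:q - 1]
    # Exactly len(s) q-grams, one starting at each position of the circle.
    return Counter(ext[i:i + q] for i in range(len(s)))

def qgram_distance(x: str, y: str, q: int) -> int:
    """L1 distance between the cyclic q-gram profiles of x and y."""
    px, py = cyclic_qgrams(x, q), cyclic_qgrams(y, q)
    # A Counter returns 0 for missing keys, so q-grams absent from one side count fully.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())
```

For example, `qgram_distance("ACGTACGT", "GTACGTAC", 3)` returns 0 because the second sequence is a rotation of the first, while `qgram_distance("AAAA", "CCCC", 2)` returns 8. A distance of 0 only shows that two sequences share the same cyclic q-gram profile; it does not by itself prove that one is a rotation of the other.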

Loughborough University Institutional Repository

Asymptotically Optimal Encodings of Range Data Structures for Selection and Top-k Queries

Authors: Gonzalo Navarro (2933529)
John Iacono (7692215)
Rajeev Raman (234139)
Roberto Grossi (7168859)
S. Rao Satti (7768412)
Publication date: 01/01/2017

Given an array A[1, n] of elements with a total order, we consider the problem of building a data structure that answers two queries: (a) selection queries receive a range [i, j] and an integer k and return the position of the k-th largest element in A[i, j]; (b) top-k queries receive [i, j] and k and return the positions of the k largest elements in A[i, j]. These problems can be solved in optimal time, O(1 + lg k / lg lg n) and O(k), respectively, using linear-space data structures. We provide the first study of encoding data structures for these problems, where A cannot be accessed at query time. Several applications are interested in the relative order of the entries of A and their positions, rather than their actual values, so A need not be kept at query time. In those cases, encodings save storage space: we first show that any encoding answering such queries requires n lg k − O(n + k lg k) bits of space; we then design encodings using O(n lg k) bits, that is, asymptotically optimal up to constant factors, while preserving optimal query time.
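To pin down the query semantics, the following brute-force sketch answers both queries directly from A, using 1-based, inclusive indices as in the abstract. It is not the optimal-time data structure or the space-efficient encoding described in the paper; it sorts the range on every query. The helper `to_ranks` (hypothetical, like the other names here) replaces values by their ranks, illustrating why an encoding only needs relative order: the answers do not change.

```python
from typing import List, Sequence

# Hypothetical brute-force reference for the two queries,
# NOT the paper's data structure or encoding.

def _check(A: Sequence, i: int, j: int, k: int) -> None:
    # Indices are 1-based and inclusive, matching A[1, n] and A[i, j] in the abstract.
    if not (1 <= i <= j <= len(A) and 1 <= k <= j - i + 1):
        raise ValueError("need 1 <= i <= j <= n and 1 <= k <= j - i + 1")

def _positions_by_value(A: Sequence, i: int, j: int) -> List[int]:
    # Positions i..j ordered from largest to smallest value.
    # Python's sort stays stable with reverse=True, so equal values keep position order.
    return sorted(range(i, j + 1), key=lambda p: A[p - 1], reverse=True)

def range_select(A: Sequence, i: int, j: int, k: int) -> int:
    """Position of the k-th largest element in A[i, j]."""
    _check(A, i, j, k)
    return _positions_by_value(A, i, j)[k - 1]

def range_top_k(A: Sequence, i: int, j: int, k: int) -> List[int]:
    """Positions of the k largest elements in A[i, j], largest first."""
    _check(A, i, j, k)
    return _positions_by_value(A, i, j)[:k]

def to_ranks(A: Sequence) -> List[int]:
    """Replace each value by its rank (1 = smallest); only relative order survives."""
    order = sorted(range(len(A)), key=lambda p: A[p])
    ranks = [0] * len(A)
    for r, p in enumerate(order, start=1):
        ranks[p] = r
    return ranks
```

For example, with A = [5, 1, 9, 3, 7], `range_select(A, 2, 5, 2)` returns 5 and `range_top_k(A, 1, 5, 3)` returns [3, 5, 1]; `to_ranks(A)` is [3, 1, 5, 2, 4], and both queries give the same answers on it.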

Archivio della Ricerca - Università di Pisa

DI-fusion

Repositorio Académico de la Universidad de Chile

Leicester Research Archive